Real-Time Trajectory Prediction Method for Intelligent Connected Vehicles in Urban Intersection Scenarios

Intelligent connected vehicles (ICVs) play an important role in improving the intelligence of transportation systems, and improving the trajectory prediction capability of ICVs benefits both traffic efficiency and safety. In this paper, a real-time trajectory prediction method based on vehicle-to-everything (V2X) communication is proposed for ICVs to improve the accuracy of their trajectory prediction. Firstly, a Gaussian mixture probability hypothesis density (GM-PHD) model is applied to construct a multidimensional dataset of ICV states. Secondly, the higher-dimensional vehicular microscopic data output by the GM-PHD model are adopted as the input of the LSTM to ensure the consistency of the prediction results. Then, the signal light factor and the Q-Learning algorithm are applied to improve the LSTM model, adding features in the spatial dimension to complement the temporal features used in the LSTM; compared with previous models, more consideration is given to the dynamic spatial environment. Finally, an intersection on Fushi Road in Shijingshan District, Beijing, was selected as the field test scenario. The experimental results show that the GM-PHD model achieved an average error of 0.1181 m, a 44.05% reduction compared to the LiDAR-based model. Meanwhile, the average error of the proposed prediction model reaches 0.501 m; compared to the social LSTM model, the prediction error was reduced by 29.43% under the average displacement error (ADE) metric. The proposed method can provide data support and an effective theoretical basis for decision systems to improve traffic safety.


Introduction
With the rapid development of 5G communication and intelligent connected vehicles (ICVs), the trajectory prediction of ICVs under a vehicle-to-everything (V2X) [1] system has become an important technology for improving the service level of ICVs [2][3][4]. Meanwhile, considering that transportation requires high levels of efficiency and safety, the accuracy and latency of trajectory prediction methods need to be further improved [5][6][7].
The collision risk between vehicles at urban intersections can be reduced effectively by predicting the trajectories of ICVs [8,9]. The first step of trajectory prediction is target detection and tracking, which can provide reliable, microscopic, multidimensional data for the real-time prediction of ICV trajectories; the target detection methods are mainly based on multi-sensor data fusion (MSDF) technology. As deep learning technologies are applied more and more widely to multilayer data coupling, vehicle perception technologies based on camera and light detection and ranging (LiDAR) sensors are also developing rapidly [10][11][12]. Jie et al. [13] proposed an optimal attribute fusion algorithm for target detection and tracking based on a Gaussian mixture probability hypothesis density (GM-PHD). The main contributions of this paper are summarized as follows:
1. We designed a real-time trajectory prediction method for ICVs that combines the advantages of the Q-Learning algorithm and the LSTM network, with more consideration of spatiotemporal characteristics. We utilized the GM-PHD model to fuse the multi-sensor data output from the camera, LiDAR, V2X unit, and traffic signal controller. Therefore, we not only enhanced the positioning capability but also improved the trajectory prediction capability of the ICVs;

2. We improved the dimensionality of the input of the improved LSTM model by using microscopic data from V2X communication, such as speed, acceleration, and traffic light timing data. Meanwhile, the signal light factor was considered in the improved LSTM model, so the proposed trajectory prediction method performs better at signal-controlled intersections;
3. Different from most previous research on vehicle trajectory prediction, we constructed an intelligent roadside unit for perceiving the data states of the ICVs, such as latitude, longitude, altitude, and acceleration, from which the trajectories of the ICVs could be predicted. Meanwhile, a practical urban intersection was selected for testing and evaluating the performance of the proposed model, obtaining a more credible result than simulation.
The remainder of this paper is organized as follows. In Section 2, combined with V2X communication, an MSDF model based on GM-PHD and an improved LSTM model based on Q-Learning are presented. In Section 3, the experimental results of the proposed model are demonstrated and analyzed. Finally, in Section 4, the conclusions of this paper, along with aspects of future work, are presented.


Real-Time Trajectory Prediction Method for Intelligent Connected Vehicles
In this section, we present a real-time trajectory prediction method based on V2X communication, as shown in Figure 1. Multisource data were obtained by the camera, LiDAR, V2X unit, and traffic signal controller as the input of the GM-PHD model to achieve ICV perception. When combined with the preprocessed historical traffic state data, the spatial-temporal trajectory information of the ICVs was obtained via an improved LSTM model. The proposed method includes two parts: (1) a vehicle perception model based on GM-PHD; (2) a vehicle trajectory prediction model based on an improved LSTM. Based on GM-PHD theory, we collected the ICV state data, which were applied to improving the LSTM model. Q-Learning was then added to the LSTM to realize the real-time trajectory prediction of the ICVs.


Vehicle Perception Model Based on GM-PHD
The perception model has two parts: (1) data preprocessing and (2) the GM-PHD model. The specific processing is shown in Figure 2. GM-PHD is a multiple object tracking (MOT) model that can fuse multi-sensor data and adapt to situations with a varying number of ICVs. The processing consists of: (1) data preprocessing; (2) modeling; (3) initialization; and (4) state prediction and processing.
(1) Data preprocessing. The image data were processed by the YOLOv5 algorithm to obtain information on the ICV states. Considering that the point cloud contains a large amount of data, the VoxelGrid [29] filtering algorithm was selected to reduce the data load. Then, the target-level perception data of the multi-sensor setup were transformed to a global coordinate system by perspective-n-point (PnP) and camera calibration. The timestamps of the sensory data were aligned by linear interpolation, and the image data were matched with the V2X communication data by license plate number. In addition, the global nearest neighbor (GNN) algorithm was applied to fuse the data of the camera, LiDAR, and V2X unit, and the states of the ICVs were output.
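As a rough illustration of this preprocessing chain, the sketch below aligns two sensor tracks by linear interpolation and associates detections with a global-nearest-neighbor (Hungarian) assignment; the function names, the 2.0 m association gate, and the simple averaging fusion are illustrative assumptions, not the implementation used in this paper.

```python
# Minimal sketch of timestamp alignment and GNN association for two sensor sources.
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_to_timestamps(src_t, src_xy, dst_t):
    """Linearly interpolate a sensor track (t, x, y) onto reference timestamps."""
    x = np.interp(dst_t, src_t, src_xy[:, 0])
    y = np.interp(dst_t, src_t, src_xy[:, 1])
    return np.stack([x, y], axis=1)


def gnn_associate(dets_a, dets_b, gate=2.0):
    """Associate two detection sets (Na, 2) and (Nb, 2), given in the global frame,
    with the GNN (Hungarian) assignment; keep pairs within the distance gate."""
    cost = np.linalg.norm(dets_a[:, None, :] - dets_b[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] <= gate]


def fuse_positions(dets_a, dets_b, pairs):
    """Naive fusion: average the matched positions from the two sources."""
    return np.array([(dets_a[i] + dets_b[j]) / 2.0 for i, j in pairs])
```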
(2) The modeling of ICVs. The geodetic coordinate system was selected as the reference for the ICVs, with the x-axis along the road direction and the y-axis along the vertical road direction. Measurement data [x, y, v_x, v_y, a_x, a_y, δ], which were acquired by V2X communication, were added to improve the tracking accuracy. We define N_obj as the number of ICVs at time k and X_k as the set of ICV states X_k = {x_1,k, x_2,k, . . . , x_i,k, . . . , x_N_obj(k),k}, where the state vector x_i,k at time k consists of the position, velocity, and acceleration. The definition is shown in Equation (1), and the updating equation is defined in Equation (2).
where [x, y] indicates the vector of the vehicular position and [v_x, v_y] indicates the vector of the vehicular speed. ε_k represents Gaussian white noise, whose covariance follows the normal distribution N(·, R), and F_k indicates the state transition matrix. The number of perceived ICVs at time k is defined as N_s(k); then, all the observed ICVs at the intersection can be represented by the measurement data set Z_k = {z_1,k, z_2,k, . . . , z_i,k, . . . , z_N_s(k),k}. The observed vector of the state of vehicle i at time k is defined as z_i,k, which contains perturbation, as shown in Equation (3). The observation equation of the sensors is shown in Equation (4).
where H_k indicates the observation matrix of the linear system, and ς_k indicates the Gaussian white noise observed by the sensor, which follows the distribution N(·, R).
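Equations (1)-(4) are not reproduced above; a plausible constant-velocity reconstruction consistent with the symbol definitions (simplified here to position and velocity, although the paper's state vector also includes acceleration) is:

```latex
% Plausible reconstruction of the state and observation model; Delta t, F_k, and
% H_k below are illustrative choices, not the authors' exact matrices.
\begin{aligned}
x_{i,k}   &= [\,x,\ y,\ v_x,\ v_y\,]^{T}, \\
x_{i,k+1} &= F_k\, x_{i,k} + \varepsilon_k,
  \qquad
  F_k = \begin{bmatrix} 1 & 0 & \Delta t & 0 \\ 0 & 1 & 0 & \Delta t \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \\
z_{i,k}   &= H_k\, x_{i,k} + \varsigma_k,
  \qquad
  H_k = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}.
\end{aligned}
```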
(3) Initialization of the GM-PHD parameters. The ICVs and potential ICVs are represented by Gaussian components {w, m, P, ξ, n}, which denote the weights, the mean states, the covariance matrices, the number of Gaussian components, and the classification, based on the GM-PHD [13] algorithm.
(4) ICV state prediction and processing. The Kalman filter was applied in the GM-PHD algorithm to predict the Gaussian components, as shown in Equations (5)-(9). In the update processing, the weights are updated by the observed ICV states based on the current state w, the detection probability, and the Mahalanobis distance, as shown in Equation (10). Then, the Gaussian components are updated to obtain the new Gaussian components.
where v_{k−1} indicates the intensity function of the ICVs at time k−1, J_{k−1} indicates the number of Gaussian components, and N(x; m^i_{k−1}, P^i_{k−1}) indicates the distribution of the i-th Gaussian component. w^i_{k−1}, m^i_{k−1}, and P^i_{k−1} indicate the weights, mean, and covariance matrix of the Gaussian component distribution; F_k indicates the state transition matrix; P^i_{γ,k} indicates the covariance matrix described by the distribution of v_{k−1} near the peak m^i_{γ,k}; w^i_{γ,k} indicates the weights of the newborn ICVs; P_{D,k} indicates the detection probability of the vehicle; v_{D,k} indicates the posterior density of the detected ICVs; (1 − P_{D,k})v_{k|k−1}(x) indicates the intensity of the undetected ICVs; Σ_{z∈Z_k} v_{D,k}(x; z) indicates the intensity of the ICVs detected by the sensors; and γ_k(x) indicates the intensity of newborn ICVs at the intersection.
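The explicit forms of Equations (5)-(10) are not reproduced above; for reference, a sketch of the standard GM-PHD prediction and update intensities, which these symbol definitions follow, is given below (the clutter intensity κ_k(z) and the measurement likelihood q^j_k(z) are standard quantities not named in the text):

```latex
\begin{aligned}
v_{k|k-1}(x) &= \sum_{i=1}^{J_{k-1}} w_{k-1}^{i}\,
  \mathcal{N}\!\big(x;\ F_k m_{k-1}^{i},\ Q_k + F_k P_{k-1}^{i} F_k^{T}\big) + \gamma_k(x), \\
v_{k}(x) &= \big(1 - P_{D,k}\big)\, v_{k|k-1}(x) + \sum_{z \in Z_k} v_{D,k}(x; z), \\
v_{D,k}(x; z) &= \sum_{j=1}^{J_{k|k-1}} w_{k}^{j}(z)\,
  \mathcal{N}\!\big(x;\ m_{k|k}^{j}(z),\ P_{k|k}^{j}\big), \qquad
w_{k}^{j}(z) = \frac{P_{D,k}\, w_{k|k-1}^{j}\, q_{k}^{j}(z)}
  {\kappa_k(z) + P_{D,k} \sum_{l} w_{k|k-1}^{l}\, q_{k}^{l}(z)}.
\end{aligned}
```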
Moreover, a large amount of computational resources can be consumed in complex scenarios with background noise, interference, and many measurements. Therefore, we adopt the method introduced by Lindenmaier [30] to prune the Gaussian components, and accurate ICV data states at the intersection were obtained. The pseudocode of the GM-PHD algorithm is shown in Table 1.
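Table 1 is not reproduced here; as a rough illustration of the pruning step mentioned above, the following sketch applies the standard weight-threshold pruning and Mahalanobis-distance merging of Gaussian components. The thresholds are illustrative and are not the refinements of [30].

```python
# Minimal sketch of GM-PHD component pruning and merging.
import numpy as np


def prune_and_merge(weights, means, covs, w_min=1e-5, merge_dist=4.0, j_max=100):
    idx = [i for i, w in enumerate(weights) if w > w_min]   # discard light components
    merged = []
    while idx:
        j = max(idx, key=lambda i: weights[i])              # heaviest remaining component
        inv_pj = np.linalg.inv(covs[j])
        close = [i for i in idx
                 if (means[i] - means[j]) @ inv_pj @ (means[i] - means[j]) <= merge_dist]
        w = sum(weights[i] for i in close)
        m = sum(weights[i] * means[i] for i in close) / w
        p = sum(weights[i] * (covs[i] + np.outer(means[i] - m, means[i] - m))
                for i in close) / w
        merged.append((w, m, p))
        idx = [i for i in idx if i not in close]
    merged.sort(key=lambda c: c[0], reverse=True)           # keep the j_max heaviest
    return merged[:j_max]
```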


Vehicle Trajectory Prediction Model Based on Improved LSTM
When combined with the states of the ICVs output by the improved GM-PHD model and the signal light states, we applied graph modeling and an encoding unit before the LSTM. The features of the V2X communication data, which can be acquired in connected scenarios, are compressed to unify the feature dimensions. Then, considering the positional relationships between vehicles, the Q-Learning algorithm was selected to obtain features in the spatial dimension, and the LSTM was selected to obtain features in the temporal dimension. After merging and decoding, the trajectories of the ICVs could be predicted from these features. The structure of the improved LSTM model is shown in Figure 3.


Graph Modeling and Features Encoding for Improved LSTM
The number of ICVs is defined as N, and each ICV is defined as a node of the graph. The node feature matrix X consists of the position coordinates (x, y), velocity v, acceleration a, heading angle ϕ, body length L, body width W, and signal light factor T_L, as shown in Equation (11). A fixed coordinate system was selected to unify the coordinates: the x-axis is defined along the road direction, the y-axis is perpendicular to the x-axis, and the coordinate system obeys the right-handed rule.
where T_L indicates the signal light factor, which is introduced as the remaining time of the red light when the ICV arrives at the next crosswalk while maintaining a constant velocity. The elements of matrix X can be obtained from the V2X fusion perception trajectory information.
The adjacency matrix G of the graph is shown in Equations (12) and (13).
where g_ij indicates the Euclidean distance between vehicles i and j. The heading angle of the ICVs can be obtained directly from the V2X perception data.
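A minimal sketch of this graph construction is given below; the dictionary-based vehicle representation, the function names, and the signal-light-factor helper are illustrative assumptions, the last being one reading of the definition of T_L above.

```python
# Each ICV is a node with features (x, y, v, a, phi, L, W, T_L); the adjacency
# matrix holds pairwise Euclidean distances g_ij.
import numpy as np


def signal_light_factor(dist_to_crosswalk, speed, red_remaining):
    """Remaining red time when the ICV would reach the next crosswalk at constant speed."""
    eta = dist_to_crosswalk / max(speed, 1e-3)   # estimated time of arrival
    return max(red_remaining - eta, 0.0)


def build_graph(vehicles):
    """vehicles: list of dicts with keys x, y, v, a, phi, L, W, T_L."""
    X = np.array([[v["x"], v["y"], v["v"], v["a"], v["phi"], v["L"], v["W"], v["T_L"]]
                  for v in vehicles])             # node feature matrix, Eq. (11)
    pos = X[:, :2]
    G = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)  # g_ij, Eqs. (12)-(13)
    return X, G
```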
Both the input and output trajectory prediction data of the ICVs are shown in Equation (14).
where Λ indicates the mapping from the historical trajectory space to the prediction trajectory space, α_in indicates the number of historical trajectory points, and β_out indicates the number of predicted trajectory points at time t.
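Equation (14) is not reproduced above; with the symbols just defined, the mapping Λ from the α_in historical points to the β_out predicted points can be written as:

```latex
% Reconstruction of the input-output mapping of Equation (14) from the symbol
% definitions above.
\Lambda:\ \big\{(x_{t-\alpha_{in}+1},\, y_{t-\alpha_{in}+1}),\ \ldots,\ (x_{t},\, y_{t})\big\}
\ \longmapsto\
\big\{(x_{t+1},\, y_{t+1}),\ \ldots,\ (x_{t+\beta_{out}},\, y_{t+\beta_{out}})\big\}
```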


Prediction of ICVs Trajectory Based on the LSTM Model
In the time dimension, the LSTM (with its deep structure) has a memory unit for storing historical time-series information; the structure of the LSTM model is shown in Figure 4. In Figure 4, the vehicle features, lane features, and signal timing information are adopted as the input of the LSTM model. The input gates, forget gates, and output gates are provided by the model as the constraint control of the ICVs. Moreover, parts of the trajectory features can be forgotten by the forget gates, and the new features obtained by the sigmoid function σ and the hyperbolic tangent function tanh are added to the LSTM in place of the trajectory features discarded in the forget gates, as shown in Equations (15) and (16).
The calculation process of the LSTM is summarized as follows, where h_t denotes the output of the LSTM in Equation (21).
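Equations (15)-(21) themselves are not reproduced above; for reference, the standard LSTM cell equations, which they follow in form (with x_t the encoded input feature and h_t the hidden output), are:

```latex
\begin{aligned}
f_t &= \sigma\!\big(W_f\,[h_{t-1},\, x_t] + b_f\big)          && \text{(forget gate)} \\
i_t &= \sigma\!\big(W_i\,[h_{t-1},\, x_t] + b_i\big)          && \text{(input gate)} \\
\tilde{c}_t &= \tanh\!\big(W_c\,[h_{t-1},\, x_t] + b_c\big)   && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t              && \text{(cell state update)} \\
o_t &= \sigma\!\big(W_o\,[h_{t-1},\, x_t] + b_o\big)          && \text{(output gate)} \\
h_t &= o_t \odot \tanh(c_t)                                   && \text{(hidden output)}
\end{aligned}
```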

Improved LSTM Based on Q-Learning
When combined with the feature of ICV spatial distribution, the Q-Learning algorithm was selected in this section to optimize the LSTM model. Q-Learning is a value-based reinforcement learning algorithm; one of its key quantities, Q(s, m), denotes the expected benefit obtained by taking action m ∈ M, and the corresponding reward is the feedback according to the action set M of the ICVs. The optimal route, which is stored in the Q-table, can be selected to obtain the maximum-benefit action. The structure of Q-Learning is shown in Figure 5.

The Q-Learning algorithm can be integrated with the LSTM model for the purpose of accurately predicting the ICV trajectory. Meanwhile, the road is coded in a grid pattern, and each road grid is defined as a road node with the red node numbers in Figure 5. The processing of the algorithm is as follows:
Step 1: Initialize the action value function Q(s, m);
Step 2: A new action m is selected by the ICV according to the Q-greedyUCB policy [31] and executed;
Step 3: Reward r is received by the ICV, and a new state s + 1 is selected;
Step 4: Update the Q*(s, m) function;
Step 5: Repeat Steps 2-4 until the ICV reaches its expected state;
Step 6: Output the last generated path scheme of the ICV.
The update of the Q*(s, m) function is shown in Equation (22).
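Equation (22) is not reproduced above; the standard tabular Q-Learning update, consistent with the symbol definitions that follow, would read:

```latex
Q^{*}(s, m) \leftarrow Q(s, m)
  + \mu\Big[\, r + \gamma \max_{m' \in M} Q(s+1, m') - Q(s, m) \,\Big]
```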
where µ (µ ∈ [0, 1]) indicates the learning rate of the Q-Learning algorithm, γ (γ ∈ [0, 1]) indicates the discount factor, which makes the algorithm pay more attention to the current or future reward, Q(s, m) indicates the current reward under the current state for the current action, and Q*(s, m) indicates the desired maximum reward obtained by the ICVs. Generally, the ICVs may have five actions (straight ahead, left lane change, right lane change, left turn, and right turn) at an intersection, as shown in Figure 6.
After action m is executed, if the ICV cannot reach the target grid, the Q-value is set to 0; otherwise, the Q-value configurations are shown in Table 2. In addition, the Q-table of the ICVs at the initial time t is shown in Equation (23). The route with the lower time cost is defined as the better scheme for the ICVs. In this section, the Q-greedyUCB algorithm [31] is selected as the action policy in the Q-Learning algorithm. In the training of the LSTM model, five driving behaviors (straight ahead, left lane change, right lane change, left turn, and right turn) are considered to achieve trajectory prediction. The trajectories of the ICVs at the intersection are shown in Figure 7.
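To make the procedure above concrete, the following sketch runs tabular Q-Learning over numbered road-grid nodes with the five actions listed; the transition map, the reward values, and the epsilon-greedy policy (used here as a stand-in for Q-greedyUCB [31]) are illustrative assumptions, since Table 2 and the real road graph are not reproduced.

```python
# Minimal tabular Q-Learning sketch over numbered road-grid nodes.
import random

ACTIONS = ["straight", "left_change", "right_change", "left_turn", "right_turn"]


def q_learning(transitions, rewards, start, goal,
               mu=0.1, gamma=0.9, epsilon=0.1, episodes=500):
    """transitions[(node, action)] -> next node for feasible moves only;
    rewards[(node, action)]     -> immediate reward (hypothetical values)."""
    Q = {key: 0.0 for key in transitions}
    for _ in range(episodes):
        s = start
        while s != goal:
            feasible = [a for a in ACTIONS if (s, a) in transitions]
            if not feasible:                      # dead end: restart the episode
                break
            if random.random() < epsilon:
                a = random.choice(feasible)       # exploration (stand-in for Q-greedyUCB)
            else:
                a = max(feasible, key=lambda act: Q[(s, act)])
            s_next = transitions[(s, a)]
            best_next = max((Q[(s_next, act)] for act in ACTIONS
                             if (s_next, act) in transitions), default=0.0)
            # Temporal-difference update in the form of Equation (22)
            Q[(s, a)] += mu * (rewards.get((s, a), 0.0) + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```

For example, transitions could map (node 3, "straight") to node 7 on the grid of Figure 5, with rewards taken from Table 2; both mappings are hypothetical here.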
The weight matrix and offset vector of the vehicle features are obtained by training the LSTM model, and the loss function of the LSTM is shown in Equation (24).
where υ(t) indicates the predicted trajectory at time t, and τ indicates the parameters of the weight matrix and offset vector in the LSTM model. The trajectory prediction by the LSTM needs to be optimized in combination with the Q-Learning algorithm; the loss function of Q-Learning combined with the LSTM is designed to fuse the vehicle trajectory behavior features and the driving features of the ICV, as shown in Equation (25).
where P_lstm indicates the probability function of the predicted trajectories in the LSTM, Q_ql indicates the probability function of the predicted trajectories in Q-Learning, and D_α indicates the α-divergence. Considering the symmetry of D_α, we set α to 0 to make the LSTM and Q-Learning prediction results as similar as possible. Finally, the loss function can be defined as Equation (26) by combining J_1 and J_2, where β (β ∈ [0, 1]) indicates the ratio of J_2 in the final loss function.
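Equations (24)-(26) are not reproduced above; a plausible form consistent with the symbol definitions, in which the squared-error form of J_1 and the notation υ̂(t) for the ground-truth trajectory are assumptions, is:

```latex
\begin{aligned}
J_1(\tau) &= \sum_{t} \big\lVert \upsilon(t) - \hat{\upsilon}(t) \big\rVert^{2}, \\
J_2 &= D_{\alpha}\!\big(P_{lstm} \,\Vert\, Q_{ql}\big), \qquad \alpha = 0, \\
J &= \beta\, J_2 + (1 - \beta)\, J_1, \qquad \beta \in [0, 1].
\end{aligned}
```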


Results and Discussion
In this section, a field test scenario for the ICVs was constructed based on an intelligent roadside unit, and the parameters of the model and scenario are listed in detail. Then, the evaluation metrics for the GM-PHD and improved LSTM models are introduced to verify and analyze the advantages of the proposed model.

Scenario and Parameters
The ICVs and road infrastructure have real-time data-exchange capabilities via the V2X unit. DSMP (an LTE-V communication protocol) was adopted by the roadside unit (RSU) to communicate with the on-board unit (OBU). Meanwhile, an intersection on the auxiliary road of Fushi Road in Shijingshan District, Beijing, was selected as the experimental scenario. The experiments were conducted between 7:00 and 19:30, and the saturation flow of the intersection was 319.04 pcu/h. The top view of the experimental scenario is shown in Figure 8b. According to the survey of the selected scenario, the experimental area covers 350 m × 350 m, marked by the red rectangle in Figure 8b, and the driving route of the ICVs is shown as an example. The driving route includes three types of driving behaviors: straight ahead, right turn, and left turn, where the green "△" indicates the origin point of the vehicle and the yellow "△" indicates the destination point. In addition, in order to verify the detection accuracy, we adopted the centimeter-level positioning data of the ICVs as the ground truth.
Moreover, an intelligent roadside unit was deployed beside the intersection, equipped with a gigabit switch, cameras, LiDAR, V2X units, and mobile edge computing (MEC). A high-performance embedded processor with 30 TOPS served as the MEC device to ensure the speed and efficiency of algorithm execution. The intelligent roadside unit is shown in Figure 8a, and the list of configurations is shown in Table 3.

Table 3. List of configurations.
Parameters | Description | Values
Intelligent roadside unit and ICVs | The number of predicted trajectory points β_out | 20

Evaluation Metrics
The performance of trajectory prediction is susceptible to the perception accuracy of the ICVs; to evaluate the perception accuracy, the mean absolute percentage error (MAPE), the mean absolute error (MAE), and the root mean square error (RMSE) are used in this paper, as shown in Equations (27)-(29). For the evaluation of the proposed prediction model, the average displacement error (ADE) and final displacement error (FDE) are adopted as the evaluation metrics. The ADE is the average Euclidean distance between the predicted trajectory and the real trajectory. The FDE is defined as the Euclidean distance between the end point of the predicted trajectory and the end point of the actual trajectory. The ADE and FDE functions are shown in Equations (30) and (31).
FDE = \sqrt{(x_{pred} - x_{truth})^2 + (y_{pred} - y_{truth})^2} (31)
where n indicates the number of vehicles, r indicates the prediction step, D^i_dist indicates the Euclidean distance between the actual and predicted coordinates of vehicle i, [x_pred, y_pred]^T indicates the end point of the predicted trajectory, and [x_truth, y_truth]^T indicates the end point of the actual trajectory.
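As a concrete reading of Equations (30) and (31), the following short sketch computes the ADE and FDE for arrays of predicted and ground-truth trajectories of shape (n, r, 2); averaging the FDE over vehicles is an assumption for the multi-vehicle case.

```python
# Minimal ADE/FDE computation for predicted vs. ground-truth trajectories,
# both arrays of shape (n_vehicles, r_steps, 2) holding (x, y) coordinates.
import numpy as np


def ade(pred, truth):
    """Average displacement error: mean Euclidean distance over vehicles and steps."""
    return float(np.mean(np.linalg.norm(pred - truth, axis=-1)))


def fde(pred, truth):
    """Final displacement error: Euclidean distance at the last predicted point
    (averaged over vehicles here)."""
    return float(np.mean(np.linalg.norm(pred[:, -1, :] - truth[:, -1, :], axis=-1)))
```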

Experimental Results and Analysis
There are three experimental ICVs, with a maximum speed of 35 km/h, and the three ICVs track the route in Figure 8. The range of V2X communication between the RSU and the ICVs is considered to be 300 m. The real-time traffic states, including the ICV dataset and the signal light state dataset, can be perceived by the intelligent roadside unit. There are 12,403 data states for the vehicles and 1151 data states for the signal lights in the dataset; parts of the dataset are shown in Tables 4 and 5.

Accuracy of ICV Perception Analysis
A test route was set for the ICVs, as described in Figure 8b, and a series of perception data were recorded accordingly. The ICV perception results were obtained from the camera, LiDAR, V2X unit, and the GM-PHD model, and their errors with respect to the ground truth position were used to mark the map, as shown in Figure 9.
The error distribution of the single-sensor model presents an irregular elliptical distribution, and it is more dispersed compared with the error distribution of the fused model. Thus, the vehicular detection information after fusion processing is closer to the real results, and the statistic of perception error is shown in Table 6.
The perception accuracy of the ICVs applying the GM-PHD model is more advantageous compared with any single sensor. The maximum, minimum, and average errors of the LiDAR are better than those of the camera and V2X unit. Compared to the LiDAR, the minimum error of the GM-PHD model is reduced by 86.58%. Moreover, the average error of the GM-PHD model is 0.1181 m, a 44.05% reduction compared to the LiDAR. By combining the data from multiple sources, the data noise is reduced, the outliers are eliminated, and the biases of each individual sensor are corrected. Thus, compared with the perception results of the single sensors, the perception results can be described more accurately by fusing the data from different sensors. Finally, the accuracy analysis of perception further verifies that the data obtained by the GM-PHD model are sufficiently credible. In order to evaluate the performance of the GM-PHD model, we compared it with the LSTM model [32], the MV3D (Multi-View 3D) model [33], and the RoarNet model [34]. The RMSE and MAE metrics were selected to evaluate the performance of the models. The comparison results are shown in Figure 10.


Advanced ICV Trajectory-Prediction Analysis
In order to evaluate the performance of the improved LSTM model for real-time trajectory prediction, the RNN encoder-decoder (RNN ED) model [35], the social LSTM model [36], and the social attention method [37] were selected for comparison with the proposed model. In addition, this section analyzes the stability of the proposed model in different time periods with different traffic flows and analyzes the time latency of the trajectory prediction.
We deployed the intelligent roadside unit on the auxiliary road of Fushi Road, and the RNN ED, social LSTM, and social attention methods were adopted to predict the trajectories of the ICVs under the three driving behaviors (straight ahead, right turn, and left turn). Part of the trajectory prediction results (e.g., right turn) is shown in Figure 11. In Figure 11, the trajectories of the ICVs predicted by the proposed model are shown as bold red lines, and the ground truth of the vehicle trajectories is shown as blue lines. Compared with the real-time trajectory prediction results of the RNN ED model, the social LSTM model, and the social attention model, the proposed model is closer to the actual driving trajectory of the ICV. The trajectory prediction results are statistically significant through repeated experiments. Under the FDE and ADE evaluation metrics, the results are shown in Figure 12.
In Figure 12, the errors of the improved LSTM model under the FDE and ADE metrics are 0.845 m and 0.501 m, respectively, and the prediction error of the social LSTM under the ADE metric is 0.710 m. Therefore, compared to the social LSTM model, the ADE of the proposed model was reduced by 29.43%. Meanwhile, compared with the social attention and social LSTM models, the prediction error of the improved LSTM model is smaller because the proposed model utilizes the intersection environment features, vehicle features, and V2X communication data. In summary, the proposed model can predict the ICV trajectory more accurately.
The system latency has an impact on the real-time performance of the system and the safety of the ICVs at the intersection. In order to analyze the latency of the prediction model, the calculation latency of the improved LSTM model is shown in Figure 13.
In Figure 13, the time interval of the fusion perception is 100 ms in the system prediction processing, and the trajectory prediction model needs 96 ms of processing time to predict the trajectory of the ICVs. The total latency of perception and trajectory prediction is therefore 196 ms, which is marked with the same color in adjacent time periods. In the fusion perception processing, a larger number of Gaussian components need to be calculated by the pruning operation of GM-PHD, leading to a reduction in efficiency. In the trajectory prediction processing, the parameter calculations and graph modeling parts of the improved LSTM model take a certain amount of time, increasing the latency of the model. When pipeline technology is applied, the trajectory prediction computation is carried out simultaneously with the next computation of the fused perception algorithm. Thus, considering that the trajectory prediction horizon of 2 s is much larger than the latency of 0.196 s, the total latency satisfies the requirement of real-time trajectory prediction.
In order to verify the effect of traffic flow on trajectory tracking and prediction over different time periods, three ICVs were continuously tested in the intersection scenarios. The errors of prediction of the ICVs at different volumes of traffic flow in different time periods are shown in Figure 14.
In Figure 14, the orange columns of the histogram indicate the perception error, and the upper and lower edges indicate the maximum and minimum detection errors. The blue color indicates the trajectory prediction error, and the green color indicates the traffic flow volume. From the data in Figure 14, the accuracy of prediction decreases as the traffic flow increases. However, the average displacement error of trajectory prediction remains lower than 0.501 m, so the prediction model satisfies the requirements of high-precision trajectory prediction of ICVs. In addition, the proposed method can be extended to other similar systems, such as highway monitoring systems and tunnel monitoring systems, to monitor the collision risk of vehicles on highways and ensure vehicular safety in tunnels.

Conclusions
In this paper, we focused on intelligent perception at an urban intersection and proposed a real-time vehicular trajectory prediction method based on V2X communication; the method was applied to an urban intersection to further improve the real-time trajectory prediction capability of ICVs. Combined with V2X data, we improved the LSTM model based on the Q-Learning algorithm, and the vehicle trajectory behavior features and the ICV driving features were fused to optimize the loss function. The experimental results demonstrated that the improved LSTM model achieved an average prediction error of 0.501 m, and the error was reduced by 29.43% under the ADE metric and 26.03% under the FDE metric compared to the social LSTM model, which enables the stable, real-time prediction of ICV trajectories in different time periods and under different traffic flow volumes.
In the future, the real-time and accurate trajectories provided by this work can be used to construct a multidimensional dataset of intersection scenarios. Meanwhile, a more complex model for trajectory prediction will be designed and applied in challenging scenarios, such as highways, tunnels, off-ramps, and roundabouts, to improve vehicular safety and urban traffic efficiency.

Data Availability Statement: All data and models used during the study appear in this article.